Application of string similarity ratio and edit distance in automatic metabolite reconciliation comparing reconstructions and models

Authors

  • Martins Mednis
  • Maike K. Aurich
Abstract

An increasing number of biochemical network models are becoming available, and reuse of these models is becoming more common. As a consequence, tools to compare models are needed. Comparison can be difficult because model builders often use different standards during reconstruction, metabolite formulas are not always indicated, IDs and names of metabolites differ, and models are stored in different formats (SBML, COBRA and others). Herein, a model comparison algorithm for SBML and COBRA format models, called ModeRator, is presented. A precondition for the correct matching of reactions is the comparison of the participating metabolites. ModeRator is based on the comparison of metabolite names as text strings. An automatic three-level filtering approach is implemented in the software, which rejects pairs of potentially equal metabolites and builds an opinion about metabolite pairs with high similarity in metabolite names. ModeRator was applied to two test cases, comparing two models each of E. coli and S. cerevisiae. Matches from the automatic mapping were manually inspected and compared with the automatic predictions. Automatic metabolite mapping of the E. coli models (1314 and 1704 metabolites), comparing only identifiers, revealed a high number of accordant metabolites. Both models originate from the same source (the BioCyc database). No significant difference between automatic mapping and manual curation was observed. For the comparison of the two S. cerevisiae models (679 and 1061 metabolites), three-level filtering by metabolite name was used. The discrepancy between manually curated predictions and ModeRator predictions was 7%.
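To make the idea of comparing metabolite names as text strings concrete, here is a minimal Python sketch that combines an edit distance with a similarity ratio. It is not ModeRator's actual filter: the function names, the normalization step, and the thresholds `min_ratio` and `max_distance` are all assumptions chosen for illustration, and the abstract does not specify how the three filtering levels are defined.

```python
from difflib import SequenceMatcher


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


def similarity_ratio(a: str, b: str) -> float:
    """Similarity in [0, 1] as computed by difflib's SequenceMatcher."""
    return SequenceMatcher(None, a, b).ratio()


def normalize(name: str) -> str:
    """Hypothetical normalization: lowercase and collapse whitespace."""
    return " ".join(name.lower().split())


def classify_pair(name_a: str, name_b: str,
                  min_ratio: float = 0.9, max_distance: int = 2) -> str:
    """Illustrative three-way decision; thresholds are assumed values."""
    a, b = normalize(name_a), normalize(name_b)
    if a == b:
        return "match"
    if similarity_ratio(a, b) >= min_ratio and edit_distance(a, b) <= max_distance:
        return "candidate"  # similar enough to pass on for manual inspection
    return "reject"


if __name__ == "__main__":
    print(classify_pair("D-Glucose", "d-glucose"))
    print(classify_pair("glucose 6-phosphate", "glucose-6-phosphate"))
    print(classify_pair("pyruvate", "ATP"))
```

Requiring both measures to agree is one possible design choice: the ratio is relative to name length, while the edit distance caps the absolute number of differing characters. Suitable thresholds would have to be tuned against manually curated pairs.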

Similar resources

Automatic comparison of metabolites names: impact of criteria thresholds

The growing number of stoichiometric reconstructions and models tends to change the model building process. Instead of creating a new model from scratch scientists can look at the earlier created relevant models to assess the opinion and consensus level of other modellers. Several initiatives have been performed to build consensus models for particular organisms following this approach. One of ...

Biosystems and Information Technology (2013)

The growing number of stoichiometric reconstructions and models tends to change the model building process. Instead of creating a new model from scratch scientists can look at the earlier created relevant models to assess the opinion and consensus level of other modellers. Several initiatives have been performed to build consensus models for particular organisms following this approach. One of ...

Learning String Edit Distance

In many applications, it is necessary to determine the similarity of two strings. A widely-used notion of string similarity is the edit distance: the minimum number of insertions, deletions, and substitutions required to transform one string into the other. In this report, we provide a stochastic model for string edit distance. Our stochastic model allows us to learn the optimal string edit dis...

Learning String Edit Distance 1

In many applications, it is necessary to determine the similarity of two strings. A widely-used notion of string similarity is the edit distance: the minimum number of insertions, deletions, and substitutions required to transform one string into the other. In this report, we provide a stochastic model for string edit distance. Our stochastic model allows us to learn a string edit distance func...

Melody Recognition with Learned Edit Distances

In a music recognition task, the classification of a new melody is often achieved by looking for the closest piece in a set of already known prototypes. The definition of a relevant similarity measure becomes then a crucial point. So far, the edit distance approach with a-priori fixed operation costs has been one of the most used to accomplish the task. In this paper, the application of a proba...

Publication date: 2013